
    Unsupervised learning of clutter-resistant visual representations from natural videos

    Populations of neurons in inferotemporal cortex (IT) maintain an explicit code for object identity that also tolerates transformations of object appearance, e.g., position, scale, and viewing angle [1, 2, 3]. Though the learning rules are not known, recent results [4, 5, 6] suggest the operation of an unsupervised temporal-association-based method, e.g., Foldiak's trace rule [7]. Such methods exploit the temporal continuity of the visual world by assuming that visual experience over short timescales will tend to have invariant identity content. Thus, by associating representations of frames from nearby times, a representation that tolerates whatever transformations occurred in the video may be achieved. Many previous studies verified that such rules can work in simple situations without background clutter, but the presence of visual clutter has remained problematic for this approach. Here we show that temporal association based on large class-specific filters (templates) avoids the problem of clutter. Our system learns in an unsupervised way from natural videos gathered from the internet, and is able to perform a difficult unconstrained face recognition task on natural images: Labeled Faces in the Wild [8].
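
    The following is a minimal sketch of the temporal-association idea behind a Foldiak-style trace rule, written in NumPy with hypothetical variable names; it illustrates the general mechanism, not the specific learning procedure used in the paper.

        import numpy as np

        def trace_rule_update(W, frames, eta=0.01, delta=0.1):
            # Foldiak-style trace rule sketch: y_bar is a running average of the
            # response over time, so features that persist across consecutive
            # video frames become associated with the same output units.
            y_bar = np.zeros(W.shape[0])
            for x in frames:                                  # frames: sequence of input vectors
                y = W @ x                                     # current response
                y_bar = (1.0 - delta) * y_bar + delta * y     # temporal trace
                W = W + eta * np.outer(y_bar, x)              # Hebbian update driven by the trace
                W = W / np.linalg.norm(W, axis=1, keepdims=True)  # keep filters normalized
            return W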

    Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex

    We discuss relations between Residual Networks (ResNets), Recurrent Neural Networks (RNNs) and the primate visual cortex. We begin with the observation that a shallow RNN is exactly equivalent to a very deep ResNet with weight sharing among the layers. A direct implementation of such an RNN, although it has orders of magnitude fewer parameters, achieves performance similar to the corresponding ResNet. We propose 1) a generalization of both RNN and ResNet architectures and 2) the conjecture that a class of moderately deep RNNs is a biologically plausible model of the ventral stream in visual cortex. We demonstrate the effectiveness of the architectures by testing them on the CIFAR-10 dataset. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
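
    The following NumPy sketch illustrates the stated equivalence: a residual stack whose layers all share the same weights computes exactly the same function as a recurrent cell with a residual state update unrolled for the same number of steps. The particular update h <- h + relu(W h) is a simplified stand-in, not the architecture evaluated in the paper.

        import numpy as np

        def resnet_forward(x, layers):
            # A plain residual stack: h_{k+1} = h_k + relu(W_k @ h_k).
            h = x
            for W in layers:
                h = h + np.maximum(W @ h, 0.0)
            return h

        def rnn_forward(x, W_cell, n_steps):
            # A recurrent cell with a residual state update, unrolled in time.
            h = x
            for _ in range(n_steps):
                h = h + np.maximum(W_cell @ h, 0.0)
            return h

        rng = np.random.default_rng(0)
        W = 0.1 * rng.standard_normal((16, 16))
        x = rng.standard_normal(16)
        # A deep ResNet whose layers share weights equals the shallow RNN unrolled.
        assert np.allclose(resnet_forward(x, [W] * 8), rnn_forward(x, W, 8))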

    Human-like Learning: A Research Proposal

    We propose Human-like Learning, a new machine learning paradigm aimed at training generalist AI systems in a human-like manner, with a focus on human-unique skills. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.

    Exact Equivariance, Disentanglement and Invariance of Transformations

    Invariance, equivariance and disentanglement of transformations are important topics in the field of representation learning. Previous models like Variational Autoencoders [1] and Generative Adversarial Networks [2] attempted to learn disentangled representations from data with different levels of success. Convolutional Neural Networks are approximately equivariant to input translations, and approximately invariant to them if pooling is performed. In this report, we argue that the recently proposed Object-Oriented Learning framework [3] offers a new solution to the problem of equivariance, invariance and disentanglement: it systematically factors out common transformations like translation and rotation in the inputs and achieves “exact equivariance” to these transformations, meaning that when the input is translated and/or rotated by some amount, the output and all intermediate representations of the network are translated and rotated by exactly the same amount. The transformations are “exactly disentangled” in the sense that the translations and rotations can be read out directly from a few known variables of the system without any approximation. Invariance can be achieved by reading other variables that are known not to be affected by the transformations. No learning is needed to achieve these properties. Exact equivariance and disentanglement are useful properties that augment the expressive power of neural networks. We believe they will enable new applications including, but not limited to, precise visual localization of objects and measurement of motion and angles. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
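
    The following sketch shows, with hypothetical names rather than the framework's actual interface, what reading transformations out of explicit variables can look like: each representation carries an invariant feature vector plus explicit pose variables, so transforming the input changes only the pose, which can be read off exactly, while the features provide invariance.

        from dataclasses import dataclass
        import numpy as np

        @dataclass
        class Symbol:
            # Hypothetical "object/symbol": invariant content plus explicit pose.
            features: np.ndarray   # unaffected by translation/rotation
            position: np.ndarray   # (x, y) translation, stored explicitly
            angle: float           # in-plane rotation, stored explicitly

        def transform(sym, shift, rotation):
            # Translating/rotating the input updates only the pose variables,
            # by exactly the applied amount (exact equivariance).
            return Symbol(sym.features, sym.position + shift, sym.angle + rotation)

        def read_pose(sym):
            # Exact disentanglement: the transformation is read off directly.
            return sym.position, sym.angle

        def read_invariant(sym):
            # Invariance: read only variables the transformation does not affect.
            return sym.features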

    3D Object-Oriented Learning: An End-to-end Transformation-Disentangled 3D Representation

    We provide a more detailed explanation of the ideas behind a recent paper on “Object-Oriented Deep Learning” [1] and extend it to handle 3D inputs/outputs. As in [1], every layer of the system takes in a list of “objects/symbols”, processes it and outputs another list of objects/symbols. In this report, the properties of the objects/symbols are extended to contain 3D information, including 3D orientation (i.e., a rotation quaternion or yaw, pitch and roll) and one extra coordinate dimension (the z-axis, or depth). The resulting model is a novel end-to-end interpretable 3D representation that systematically factors out common 3D transformations such as translation and 3D rotation. As first proposed in [1] and discussed in more detail in [2], it offers a “symbolic disentanglement” solution to the problem of transformation invariance/equivariance. To demonstrate the effectiveness of the model, we show that it achieves perfect performance on the task of 3D invariant recognition when trained on a single rotation of a 3D object and tested on 3D rotations at arbitrary angles of yaw, pitch and roll. Furthermore, in the more realistic case where depth information is not given (similar to viewpoint-invariant object recognition from 2D vision), our model generalizes reasonably well to novel viewpoints while ConvNets fail to generalize. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
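
    As a companion to the 2D sketch above, the following shows one plausible way (a hypothetical structure, not the paper's implementation) to extend the per-object pose to 3D: an (x, y, z) position including depth plus a unit quaternion for orientation, with a 3D rotation of the input composed directly into the stored quaternion so that yaw, pitch and roll remain readable at every layer.

        import numpy as np

        def quat_mul(q, r):
            # Hamilton product of quaternions given as (w, x, y, z).
            w1, x1, y1, z1 = q
            w2, x2, y2, z2 = r
            return np.array([
                w1*w2 - x1*x2 - y1*y2 - z1*z2,
                w1*x2 + x1*w2 + y1*z2 - z1*y2,
                w1*y2 - x1*z2 + y1*w2 + z1*x2,
                w1*z2 + x1*y2 - y1*x2 + z1*w2,
            ])

        class Symbol3D:
            # Hypothetical 3D object/symbol: features, 3D position (with depth),
            # and orientation stored as a unit quaternion.
            def __init__(self, features, position_xyz, orientation_quat):
                self.features = np.asarray(features, dtype=float)
                self.position = np.asarray(position_xyz, dtype=float)
                self.orientation = np.asarray(orientation_quat, dtype=float)

        def rotate_input(sym, rot_quat):
            # A 3D rotation of the input composes into the stored orientation,
            # so yaw/pitch/roll can be read out exactly at any layer.
            return Symbol3D(sym.features, sym.position, quat_mul(rot_quat, sym.orientation))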

    Object-Oriented Deep Learning

    We investigate an unconventional direction of research that aims at converting neural networks, a class of distributed, connectionist, sub-symbolic models, to a symbolic level, with the ultimate goal of achieving AI interpretability and safety. To that end, we propose Object-Oriented Deep Learning, a novel computational paradigm of deep learning that adopts interpretable “objects/symbols” as its basic representational atom instead of N-dimensional tensors (as in traditional “feature-oriented” deep learning). For visual processing, each “object/symbol” can explicitly package common properties of visual objects such as position, pose, scale, probability of being an object, and pointers to parts, providing a full spectrum of interpretable visual knowledge throughout all layers. It achieves a form of “symbolic disentanglement”, offering one solution to the important problem of disentangled representations and invariance. The basic computations of the network are predicting high-level objects and their properties from low-level objects, and binding/aggregating relevant objects together. These computations operate at a more fundamental level than convolution, capturing it as a special case while being significantly more general. All operations are executed in an input-driven fashion, so sparsity and per-sample dynamic computation are naturally supported, complementing recent popular ideas on dynamic networks and potentially enabling new types of hardware acceleration. We show experimentally on CIFAR-10 that the model can perform flexible visual processing, rivaling the performance of ConvNets, without using any convolution. Furthermore, it can generalize to novel rotations of images that it was not trained on. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
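
    The following sketch shows, under stated assumptions, what a "list of objects in, list of objects out" computation could look like: each object carries interpretable properties, and a layer predicts higher-level objects by binding/aggregating nearby lower-level ones. The names and the nearest-neighbour grouping are hypothetical placeholders, not the operators used in the paper.

        from dataclasses import dataclass, field
        import numpy as np

        @dataclass
        class VisualObject:
            # Hypothetical "object/symbol" with interpretable properties.
            features: np.ndarray
            position: np.ndarray                         # (x, y)
            scale: float = 1.0
            objectness: float = 1.0                      # probability of being an object
            parts: list = field(default_factory=list)    # pointers to constituent objects

        def bind_layer(objects, radius=1.0):
            # One layer: group nearby low-level objects and predict one
            # higher-level object per group (a placeholder aggregation).
            out, used = [], set()
            for i, anchor in enumerate(objects):
                if i in used:
                    continue
                idx = [i] + [j for j, o in enumerate(objects)
                             if j != i and j not in used
                             and np.linalg.norm(anchor.position - o.position) < radius]
                used.update(idx)
                group = [objects[j] for j in idx]
                out.append(VisualObject(
                    features=np.mean([o.features for o in group], axis=0),
                    position=np.mean([o.position for o in group], axis=0),
                    scale=float(np.mean([o.scale for o in group])),
                    objectness=float(np.mean([o.objectness for o in group])),
                    parts=group,
                ))
            return out    # another list of objects/symbols, ready for the next layer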

    Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review

    The paper characterizes classes of functions for which deep learning can be exponentially better than shallow learning. Deep convolutional networks satisfy these conditions as a special case, though weight sharing is not the main reason for their exponential advantage.
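
    As a rough illustration of the kind of function class involved (the specific constituent functions below are toy choices, not taken from the paper), consider a hierarchically compositional function of many variables built from constituent functions of only two variables each; a deep network whose connectivity mirrors this binary-tree structure can represent such a function with far fewer units than a shallow network that must process all inputs at once.

        # f(x1,...,x8) = h3( h21( h11(x1,x2), h12(x3,x4) ),
        #                    h22( h13(x5,x6), h14(x7,x8) ) )
        def h(a, b):
            # A generic two-argument constituent function (toy example).
            return a * b + a

        def f(x):
            # x: a sequence of 8 numbers, combined along a binary tree.
            level1 = [h(x[0], x[1]), h(x[2], x[3]), h(x[4], x[5]), h(x[6], x[7])]
            level2 = [h(level1[0], level1[1]), h(level1[2], level1[3])]
            return h(level2[0], level2[1])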